feat(stt): warm audio pipeline on prepare and retain between recordings #247

Open: chriswritescode-dev wants to merge 5 commits into main from forge/stt-warmup

Conversation

@chriswritescode-dev
Owner

Add prepare() method to AudioRecorder to warm the audio context and worklet before first recording. Retain the audio pipeline between recordings (only the mic track is stopped after each use). Full resources are released via dispose() when STT is disabled or the component unmounts.

  • Add AudioRecorder.prepare() to pre-warm audio context and worklet
  • Keep audio context alive across recordings; only stop mic track between uses
  • Add AudioRecorder.dispose() for full cleanup of all resources
  • Refactor cleanup() → cleanupRecording(closeAudioContext) to distinguish partial vs full cleanup
  • Fix race condition: dispose() during async start() prevents entering recording state
  • Update useSTT to use disposeAudioRecorder on unmount and timeout
  • Document the performance improvement in docs/features/stt.md
  • Test coverage for prepare, reuse across recordings, lifecycle cancellation
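
The lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the code in frontend/src/lib/audioRecorder.ts: `SketchRecorder`, `RecorderDeps`, `PipelineLike`, `MicTrackLike` and the `generation` counter are hypothetical names, and the browser objects (audio context, worklet, mic track) are replaced by injected stand-ins so the flow can be followed without a browser. Only `prepare()`, `dispose()` and `cleanupRecording(closeAudioContext)` are names taken from the description.

```typescript
// Hypothetical stand-ins for the browser objects a real recorder would hold.
interface PipelineLike { close(): void } // e.g. audio context plus worklet node
interface MicTrackLike { stop(): void }  // e.g. the microphone track

interface RecorderDeps {
  createPipeline(): Promise<PipelineLike>
  openMic(): Promise<MicTrackLike>
}

type RecorderState = 'idle' | 'starting' | 'recording'

class SketchRecorder {
  state: RecorderState = 'idle'
  private deps: RecorderDeps
  private pipeline: PipelineLike | null = null
  private mic: MicTrackLike | null = null
  private generation = 0 // bumped by dispose() to cancel an in-flight start()

  constructor(deps: RecorderDeps) {
    this.deps = deps
  }

  /** Warm the expensive pipeline once; later calls reuse it. */
  async prepare(): Promise<void> {
    if (this.pipeline) return
    const gen = this.generation
    const pipeline = await this.deps.createPipeline()
    if (gen !== this.generation || this.pipeline) {
      pipeline.close() // disposed, or already warmed, while we were waiting
      return
    }
    this.pipeline = pipeline
  }

  /** Resolves to true only if the recorder actually entered 'recording'. */
  async start(): Promise<boolean> {
    if (this.state !== 'idle') return false
    this.state = 'starting'
    const gen = this.generation
    await this.prepare()
    if (gen !== this.generation) return false // dispose() ran during prepare
    const mic = await this.deps.openMic()
    if (gen !== this.generation) {
      mic.stop() // dispose() ran while the mic was opening
      return false
    }
    this.mic = mic
    this.state = 'recording'
    return true
  }

  /** Partial cleanup: release the mic, keep the warmed pipeline. */
  stop(): void {
    if (this.state !== 'recording') return
    this.cleanupRecording(false)
  }

  /** Full cleanup: release everything and cancel any pending start(). */
  dispose(): void {
    this.generation++
    this.cleanupRecording(true)
  }

  private cleanupRecording(closeAudioContext: boolean): void {
    this.mic?.stop()
    this.mic = null
    if (closeAudioContext) {
      this.pipeline?.close()
      this.pipeline = null
    }
    this.state = 'idle'
  }
}
```

In this sketch a second `start()` after `stop()` only pays for reopening the mic, which is the saving the PR is after, and a `dispose()` that lands between the awaits in `start()` leaves the recorder idle instead of recording.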

Files

  • docs/features/stt.md
  • frontend/src/hooks/useSTT.ts
  • frontend/src/hooks/useSTT.test.tsx
  • frontend/src/lib/audioRecorder.ts
  • frontend/src/lib/audioRecorder.test.ts

- New VoiceActivityDetector class: adaptive noise floor, per-frame RMS
  classification, trailing-silence endpointing, configurable thresholds
- AudioWorkletProcessor computes RMS in existing flush loop and posts
  {samples, rms} instead of a bare Int16Array
- AudioRecorder integrates VAD via handleVadFrame() in both worklet and
  ScriptProcessor paths; auto-calls stop() on trailing silence; silence
  gate in processRecording() fires onNoSpeech instead of sending empty
  audio to the transcription API (fixes Whisper hallucination bug)
- useSTT wires setOnNoSpeech to reset isProcessing/isRecording cleanly
  without surfacing an error
- computeRms() exported helper for ScriptProcessor fallback path
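
A rough sketch of the detection logic described above, for orientation only. `computeRms` is named in the commit, but its signature here (a `Float32Array` of samples in [-1, 1]) is an assumption; `SketchVad`, `VadOptions` and every threshold name and value are hypothetical and not taken from the PR's VoiceActivityDetector.

```typescript
/** Root-mean-square level of one frame; assumes samples in [-1, 1]. */
function computeRms(samples: Float32Array): number {
  if (samples.length === 0) return 0
  let sumSquares = 0
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i]
  }
  return Math.sqrt(sumSquares / samples.length)
}

// Hypothetical option names; the real class may expose different ones.
interface VadOptions {
  speechRatio: number       // frame is speech when rms > noiseFloor * speechRatio
  minSpeechRms: number      // absolute lower bound for the speech threshold
  floorAdaptRate: number    // 0..1, how fast the floor tracks quiet frames
  trailingSilenceMs: number // silence after speech that ends the utterance
}

interface VadResult { isSpeech: boolean; endOfSpeech: boolean }

class SketchVad {
  private opts: VadOptions
  private noiseFloor: number
  private heardSpeech = false
  private silenceMs = 0

  constructor(opts: VadOptions, initialFloor = 0.005) {
    this.opts = opts
    this.noiseFloor = initialFloor
  }

  /** True once any frame in this take was classified as speech. */
  get hasSpeech(): boolean {
    return this.heardSpeech
  }

  processFrame(rms: number, frameMs: number): VadResult {
    const threshold = Math.max(
      this.opts.minSpeechRms,
      this.noiseFloor * this.opts.speechRatio,
    )
    const isSpeech = rms > threshold
    if (isSpeech) {
      this.heardSpeech = true
      this.silenceMs = 0
    } else {
      // Adapt the floor on quiet frames only, so speech does not raise it.
      this.noiseFloor += this.opts.floorAdaptRate * (rms - this.noiseFloor)
      if (this.heardSpeech) this.silenceMs += frameMs
    }
    const endOfSpeech =
      this.heardSpeech && this.silenceMs >= this.opts.trailingSilenceMs
    return { isSpeech, endOfSpeech }
  }
}
```

A recorder built on this would call its stop path when `endOfSpeech` turns true, and could use `hasSpeech` as the silence gate: if it is still false when the take ends, report "no speech" rather than sending the audio for transcription.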

stopRecording() called recorder.stop() before setIsProcessing(true). For
a silent take the recorder synchronously fires onNoSpeech (which sets
processing false), so the subsequent setIsProcessing(true) ran last and
left the voice overlay stuck in the processing state.

Set processing before stop() so the recorder callback (onNoSpeech or
onDataAvailable) has the final say on the processing flag.
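
The ordering problem can be reproduced in isolation. The sketch below is illustrative only: `makeSilentRecorder` and `runStop` are hypothetical helpers, and a plain variable stands in for the React state that useSTT actually holds.

```typescript
// Hypothetical minimal recorder: stop() reports its outcome synchronously.
interface RecorderLike {
  stop(): void
}

// Models a silent take, where stop() immediately fires onNoSpeech.
function makeSilentRecorder(onNoSpeech: () => void): RecorderLike {
  return { stop: () => onNoSpeech() }
}

/** Runs one stop sequence and returns the final value of the flag. */
function runStop(order: 'flag-after-stop' | 'flag-before-stop'): boolean {
  let isProcessing = false
  const setIsProcessing = (value: boolean): void => {
    isProcessing = value
  }
  const recorder = makeSilentRecorder(() => setIsProcessing(false))

  if (order === 'flag-after-stop') {
    recorder.stop()       // callback clears the flag...
    setIsProcessing(true) // ...and this overwrites it: stuck in processing
  } else {
    setIsProcessing(true) // enter processing first
    recorder.stop()       // the callback has the final say
  }
  return isProcessing
}
```

Under these assumptions `runStop('flag-after-stop')` leaves the flag set and `runStop('flag-before-stop')` clears it, which mirrors the stuck overlay and its fix.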
…icks

Two remaining stuck paths after the silent-stop fix:

- processRecording() early-returned without firing onNoSpeech when a take
  had no captured audio (e.g. a quick start/stop under one worklet flush,
  totalSamples === 0). Now any non-emitting path (empty audio or no
  detected speech) fires onNoSpeech, while a genuine abort still returns
  silently.
- stopRecording() set isProcessing true even when the recorder was no
  longer recording (double-stop or post auto-stop), where stop() skips
  processRecording entirely and nothing clears the flag. Guard the
  external path on recorder state before entering processing.
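
Both remaining paths can be sketched together. This is an assumption-laden illustration, not the PR's code: `SketchTake`, `TakeCallbacks` and `guardedStopRecording` are hypothetical, and the recorder is reduced to an in-memory buffer. It shows the two rules stated above: every non-emitting path except a genuine abort fires onNoSpeech, and the caller only enters the processing state when the recorder is actually recording.

```typescript
interface TakeCallbacks {
  onNoSpeech: () => void
  onDataAvailable: (samples: Int16Array) => void
}

class SketchTake {
  isRecording = false
  private callbacks: TakeCallbacks
  private chunks: Int16Array[] = []
  private speechDetected = false
  private aborted = false

  constructor(callbacks: TakeCallbacks) {
    this.callbacks = callbacks
  }

  start(): void {
    this.chunks = []
    this.speechDetected = false
    this.aborted = false
    this.isRecording = true
  }

  push(chunk: Int16Array, isSpeech: boolean): void {
    if (!this.isRecording) return
    this.chunks.push(chunk)
    if (isSpeech) this.speechDetected = true
  }

  abort(): void {
    this.aborted = true
    this.isRecording = false
    this.processRecording()
  }

  stop(): void {
    if (!this.isRecording) return // double-stop: processRecording is skipped
    this.isRecording = false
    this.processRecording()
  }

  private processRecording(): void {
    if (this.aborted) return // a genuine abort stays silent
    let totalSamples = 0
    for (const chunk of this.chunks) totalSamples += chunk.length
    if (totalSamples === 0 || !this.speechDetected) {
      this.callbacks.onNoSpeech() // empty audio or no detected speech
      return
    }
    const merged = new Int16Array(totalSamples)
    let offset = 0
    for (const chunk of this.chunks) {
      merged.set(chunk, offset)
      offset += chunk.length
    }
    this.callbacks.onDataAvailable(merged)
  }
}

/** Caller-side guard: only enter processing if a callback will clear it. */
function guardedStopRecording(
  take: SketchTake,
  setIsProcessing: (value: boolean) => void,
): void {
  if (!take.isRecording) return
  setIsProcessing(true)
  take.stop()
}
```

With this shape, a start followed immediately by a stop ends in onNoSpeech, and a second stop call is a no-op that never touches the processing flag.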